Antipattern: Pattern Matching Predicates

Let’s look at how we can use LIKE and REGEXP predicates for searching from a large collection.

SQL provides pattern-matching predicates for comparing strings, and this is the first solution most programmers use when searching for keywords.

Using pattern matching LIKE predicates#

The most widely supported of these is the LIKE predicate. The LIKE predicate supports a wildcard (%) that matches zero or more characters. Using this wildcard before and after a keyword matches any string that contains that word. The first wildcard matches any text preceding the word, and the second wildcard matches any text following the word.

Retrieving data using LIKE predicate in standard SQL

Using regular expressions for pattern matching#

Regular expressions are also supported by many database brands, although not in a standard way. You don’t need wildcards, because conventionally regular expressions match the pattern against any substring anyway. Here’s an example using MySQL’s regular expression predicate:

Retrieving data using REGEXP predicate in MySQL

Limitations of using pattern matching#

The most important disadvantage of pattern matching operators is that they have poor performance. They can’t benefit from a conventional index, so they must scan every row in a table. Since matching a pattern against a string column is an expensive operation (relative to, for instance, comparing two integers for equality), the total cost of a table scan for this search is very high.

The second problem of simple pattern matching using LIKE or regular expressions is that it can find un intended matches.

Code showing unintended results while pattern matching

The previous example matches text that contains the word “one”, but it also matches the strings “money”, “prone”, “lonely”, and so on. Searching for a pattern with the keyword delimited by spaces doesn’t match occurrences of the word with punctuation or at the start or end of the text. The regular expressions supported by our database might support a special pattern for a word boundary to solve this issue:

Retrieving data using REGEXP for search in MySQL

The query is executed successfully, and there are no results returned. It means that the expression did not return false matches.

Given the problems of performance and scalability and the gymnastics we have to perform to prevent irrelevant matches, simple pattern matching is a poor technique for searching for keywords.

Synopsis: Poor Man’s Search Engine
Solution: Use the Right Tool for the Job
Mark as Completed
Report an Issue